A Variational Framework for Simultaneous Motion Estimation and Restoration of Motion-Blurred Video


Figure  1.  From  two  real  blurred  frames  (left),  we  automatically  and  simultaneously  estimate  the  motion  region,  the  motion  vector,  and  the 
image  intensity  of  the  foreground  (middle).  Based  on  this  and  the  background  intensity  we  reconstruct  the  two  frames  (right). 
Abstract

This paper addresses the problem of motion estimation and restoration of objects in a blurred video sequence. Fast movement of the objects, together with the aperture time of the camera, results in a motion-blurred image. Direct velocity estimation from such blurred video is inaccurate. On the other hand, an accurate estimate of the velocity of the moving objects is critical for the restoration of motion-blurred video. Restoration therefore needs accurate motion estimation and vice versa, and a joint process is called for. To address this problem we derive a novel model of the blurring process and propose a Mumford-Shah type variational framework, acting on consecutive frames, for joint object deblurring and velocity estimation. The proposed procedure distinguishes between the moving object and the background and remains accurate close to the boundary of the moving object. Experimental results on both simulated and real data show the importance of this joint estimation and its superior performance when compared to the independent estimation of motion and restoration.
1. Introduction

Motion estimation, that is, the computation of the velocity of moving objects in a given image sequence, is a well-known problem in image processing and has received significant attention in recent years. Optical flow computation is one example of a widely used approach to motion estimation. Numerous methods have been developed to determine this flow, e.g., [10, 24]. It is commonly known that the sharper the sequence is, the more reliably the motion can be estimated. While some robustness has been addressed in motion estimation, e.g., under varying illumination [13] and contrast [4], a simple inspection of the state-of-the-art literature on the subject immediately reveals that the videos considered are quite sharp and in general of sufficiently high quality. In particular, blurred video (see below) is very seldom considered in motion estimation techniques.

Many real-world effects on video footage make motion estimation more difficult. In this paper, we address how to handle one of these critical effects. In video footage from a standard video camera, it is quite noticeable that relatively fast moving objects appear blurred (cf. Fig. 1). This effect is called motion blur. It is caused by the way a camera takes pictures and is linked to the aperture time of the camera, which roughly integrates information in time. The longer the aperture is open, or the faster the motion, the blurrier moving objects appear.

To improve the accuracy of motion estimation on a video suffering from motion blur, it would be helpful to remove the motion blur first. On the other hand, if the actual motion is known, the motion blur can be removed by “deconvolution,” since the motion gives the velocity of the objects and therefore the exact kernel needed for deconvolution. Since these two problems are intertwined, it is natural to develop a method that tackles both at once.

In this paper we introduce a variational method which jointly handles motion estimation, moving object detection, and motion blur deconvolution (cf. Fig. 1). The proposed framework is a Mumford-Shah type of convex variational formulation, which includes explicit modelling of the motion-blur process as well as shape and image regularization terms, and is solved via efficient regularized descent techniques. The input to the variational formulation consists of two consecutive frames, while the output consists of the corresponding reconstructed frames, the segmented moving object, and the actual motion velocity. As demonstrated in this paper, this joint estimation of motion, moving object region, and reconstructed images outperforms techniques where each unknown is handled individually.

Before proceeding with the explicit description of the proposed framework, let us illustrate this last point. For this, we use the image in Fig. 2, which, although artificial, is very challenging and appropriate to demonstrate the advantage of joint estimation. In this figure, the Einstein insert $f_{\text{obj}}$ is moving (velocity vector $v = (6, 7)$), while the Lena background $f_{\text{bg}}$ is fixed. Computing the velocity independently from the blurred frames leads to an inaccurate estimate of $v = (5.78, 6.80)$ and of the moving region (level set of $\phi$), which results in an unsatisfactory restoration of the blurred frames (first image in second row of Fig. 3, see also Fig. 7). With our proposed joint technique, we obtained $v = (5.98, 7.009)$, and both the frames (last row of Fig. 2) and the moving region (blue curve, level line of $\phi$ in middle row of Fig. 2) are accurately recovered.

The remainder of this paper is organized as follows. After briefly presenting the related literature and a summary of our key contributions, we describe the motion model in Sec. 2 and derive our variational formulation in Sec. 3. Then, in Sec. 4, results of the joint approach are discussed. Section 5 is devoted to a detailed description of the energy minimization algorithm, and in Sec. 6 we draw conclusions and give an outlook. The appendix contains a comprehensive collection of gradient components required in the algorithm.


Figure 2. Results on an artificial motion blur sequence showing a square with a picture of Einstein moving on the Lena image as background. The input images $g_1$ and $g_2$ (top), the recovered object intensity $f_{\text{obj}}$, the initial boundary contour of the object (red) and the computed contour (blue) (middle row), and finally the recovered frames $f_1$ and $f_2$ (bottom) are depicted.



Figure  3.  For  the  example  from  Fig.  2,  intermediate  results  from 
our  algorithm  are  depicted.  In  the  top  row  from  left  to  right  the 
object  contour  is  shown  for  three  iterations  from  the  initialization 
phase  based  on  motion  competition  without  deblurring.  On  the 
bottom  three  follow-up  iterations  of  the  joint  method  including 
the  restoration  of  the  frame  are  depicted. 


1.1.  Related  works  and  key  contributions 

There exist numerous methods to remove motion blur using a single frame,¹ and these often introduce strong assumptions on the scene and/or blur [8]. As an example, let us mention the recent contribution on blind motion deblurring using image statistics presented in [17], where the author explains, as is clear from the results, that while the image is often well recovered, the actual motion and region of movement are often quite inaccurate. Another recent approach to motion deblurring [11] uses blending with the background but assumes the shift-variant case. Further, [2] tackles piecewise shift-variant deblurring, including a segmentation of the blurred regions. Of more interest to our approach are techniques that use multiple frames, and these
(some  of  them  hardware  based)  are  only  very  few,  as  sum¬ 
marized  in  [8].  More  on  the  close  connection  between  our 
work  and  [<  ]  will  be  presented  below. 

Sequential motion estimation followed by deblurring has been reported in [16] (see also [19]), without addressing a truly joint estimation. The idea of developing joint methods for intertwined problems has become quite popular and successful recently, for example for blind deconvolution and denoising [9], segmentation of moving objects in front of a still background and the computation of the motion velocities [14], segmentation and registration using geodesic active contours [12, 23], anisotropic classification and cartoon extraction [3], and optical flow computation and video denoising [18].

Motion  deblurring  can  also  be  obtained  with  the  so 
called  “super-resolution  framework,”  see  [2  ]  and  refer¬ 
ences  therein.  The  basic  idea  behind  these  approaches, 
which  often  assume  that  the  blurring  kernel  is  provided, 
is  to  obtain  a  higher  resolution  image  from  a  collection  of 
low-resolution  frames.  In  addition,  these  techniques  often 
assume  that  the  whole  frame  suffers  motion  blur  (or  attack 
this  with  robust  norms),  and  do  not  explicitly  separate  the 
moving  object  from  the  background  or  estimate  the  motion 
velocity. 

The pioneering work by Favaro and Soatto [8] is the closest to ours, not only because of the use of multiple frames but also because of the joint estimation. In a separate paper, they also [ ] address the problem of simultaneously inferring the depth map, radiance, and motion from motion-blurred and defocused images. Thus, these works address the same challenge as we do here, namely the joint estimation of motion and scene deblurring from multiple frames. One difference is that the authors of [8] approximate the motion blur with a Gaussian, rather than the more accurate rectangular filter described in the next section. This model leads them to an anisotropic diffusion flow, and inverting it is ill-posed. On the other hand, the variational formulation we propose here is well-posed and convex. The model in [8] is designed to handle only very little blur (motion), while the proposed method, as illustrated by the real examples below, can handle large velocities and blurs. We also model the crucial blending of the foreground and background, which happens in reality and significantly affects the blur as well as the reconstruction near the boundary of the moving object (see the examples in Figs. 2, 5, and 6). Finally, we note that while the proposed formulation could deal with multiple moving objects, in this paper we provide examples with only one, whereas [8] develop their work for multiple moving objects, although they present no examples of this capability with real video data.

¹ Similarly, the literature on motion estimation is abundant. Here we concentrate only on works addressing blurred video.

To recap, this paper addresses the important and challenging problem of joint motion estimation and scene reconstruction from multiple frames. This problem has been widely ignored in the literature: ordinary motion estimation techniques assume sharp videos, while deblurring techniques often rely on other assumptions that are not always realistic. Furthermore, we incorporate a motion blur model which is consistent at motion singularities. The important differences from the only closely related method, proposed in [8], are detailed above.

2.  Modeling  the  blurring  process 

Images from an image sequence captured with a video camera are integrated measurements of light intensity emitted from moving objects over the aperture time interval of the camera. Let $f : [-T, T] \times \Omega \to \mathbb{R};\ (t, x) \mapsto f(t, x)$ denote a continuous sequence of scene intensities over a time interval $[-T, T]$ and on a spatial image domain $\Omega$ observed via the camera lens. The video sequence recorded with the camera consists of a set of images $g_i : \Omega \to \mathbb{R}$ associated with times $t_i$, for $i = 1, \dots, m$, given as the convolution

\[ g_i(x) = \frac{1}{\tau} \int_{-\frac{1}{2}\tau}^{\frac{1}{2}\tau} f(t_i + s, x)\, ds \qquad (1) \]

over the aperture time $\tau$. For the time integral, we propose a box filter, which realistically approximates the mechanical shutters of film cameras and the electronic read out of modern CCD video recorders. In the simplest case, where the sequence $f$ renders an object moving at constant velocity $v \in \mathbb{R}^2$, i.e. $f(x - sv) = f(t + s, x)$, we can transform integration in time to an integration in space and obtain for the recorded images

\[ g_i(x) = \frac{1}{\tau} \int_{-\frac{1}{2}\tau}^{\frac{1}{2}\tau} f(x - sv)\, ds = (f * h_v)(x), \qquad (2) \]

for a one-dimensional filter kernel $h_v(y) = \delta_0\!\left(\frac{v^\perp}{|v|} \cdot y\right) h\!\left(\frac{v}{|v|} \cdot y\right)$ with filter width $\tau |v|$ in the direction of the motion trajectory $\{y = x + sv : s \in \mathbb{R}\}$. Here $v^\perp$ denotes $v$ rotated by



Figure 4. We consider a moving circle with black and white stripes in front of a similarly textured background. For this test case a comparison is shown between the wrong motion blur model (left), which ignores the motion discontinuity at the boundary, and our realistic, consistent model (right) given in (4).


Figure 5. Given two frames for the realistic motion blur showing the moving circle on the textured background from Fig. 4 (left), computational results for the deblurring are depicted based on the wrong motion blur model built into $G_i$ (middle), and on our consistent model (right). This clearly illustrates the importance of properly handling the motion discontinuity in the considered motion blur model.


90 degrees, $\delta_0$ is the usual 1D Dirac distribution, and $h$ is the 1D block filter with $h(s) = \frac{1}{\tau |v|}$ for $s \in \left[-\frac{\tau |v|}{2}, \frac{\tau |v|}{2}\right]$ and $h(s) = 0$ otherwise. In the case of an object moving in front of a (still) background the situation is somewhat more complicated. At a point $x$ close to the boundary of the object, the convolution (1) decomposes into a spatial convolution of object intensities along the motion path for the sub-interval of the aperture interval where the object covers the background at position $x$, and a retrieval of the background intensity for the remaining opening time of the lens. Figure 4 shows a comparison between the actually observed motion blur and results obtained by a (wrong) direct application of the spatial convolution formula (2) on a moving circular object in front of a textured background (more specifics on this below). This observation is particularly important for the reliable recovery of boundaries of moving objects from recorded video frames $g_i$ and subsequently for the proper restoration of image frames (cf. Fig. 5 for a corresponding comparison).
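The equivalence between integrating in time, as in (1), and convolving in space with a box kernel, as in (2), can be checked on a small discrete example. The following is a minimal sketch under simplifying assumptions that are not part of the paper: a 1D periodic signal, an integer velocity of one pixel per time sample, and an odd number of time samples. The function names are hypothetical.

```python
def box_kernel(width):
    """Discrete block filter h: `width` equal taps that sum to one."""
    return [1.0 / width] * width


def convolve_periodic(signal, kernel):
    """Centered periodic convolution (signal * kernel)(x); kernel length must be odd."""
    n, half = len(signal), len(kernel) // 2
    return [
        sum(kernel[j + half] * signal[(x - j) % n] for j in range(-half, half + 1))
        for x in range(n)
    ]


def time_averaged_frame(signal, velocity, samples):
    """Average the moving scene f(s, x) = signal[x - s * velocity] over the aperture.

    The odd number of `samples` time steps s = -m, ..., m stands in for the
    time integral in (1).
    """
    n, m = len(signal), samples // 2
    return [
        sum(signal[(x - s * velocity) % n] for s in range(-m, m + 1)) / samples
        for x in range(n)
    ]


# A sharp 1D "object" (a plateau) that moves one pixel per time sample.
sharp = [0.0] * 6 + [1.0] * 4 + [0.0] * 6
blurred_in_time = time_averaged_frame(sharp, velocity=1, samples=5)
blurred_in_space = convolve_periodic(sharp, box_kernel(5))
```

Both frames coincide in this setting, and the total intensity of the plateau is preserved by the normalized kernel. The sketch covers only the case of a single moving signal without a still background.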

In what follows we consider an object moving with velocity $v \in \mathbb{R}^2$ in front of a still background $f_{\text{bg}} : \Omega \to \mathbb{R}$ (which simplifies the formulation; see Sec. 6 for remarks on the generalization). The object at time 0 is represented by an intensity function $f_{\text{obj}} : \Omega_{\text{obj}} \to \mathbb{R}$ defined on an object domain $\Omega_{\text{obj}}$. From $f_{\text{obj}}$ and $f_{\text{bg}}$ one assembles the actual scene intensity function

\[ f(t, x) = f_{\text{obj}}(x - tv)\, \chi_{\text{obj}}(x - tv) + f_{\text{bg}}(x) \left(1 - \chi_{\text{obj}}(x - tv)\right) \qquad (3) \]

at time $t$, where $\chi_{\text{obj}} : \mathbb{R}^2 \to \mathbb{R}$ denotes the characteristic function of $\Omega_{\text{obj}}$. Now, inserting (3) in (1) and then using (2) on $\Omega_{\text{obj}}$, we deduce the correct formula for the theoretically observed motion blur at time $t_i$,

\[ G_i[\Omega_{\text{obj}}, v, f_{\text{obj}}, f_{\text{bg}}](x) := \left((f_{\text{obj}} \chi_{\text{obj}}) * h_v\right)(x - t_i v) + f_{\text{bg}}(x) \left(1 - (\chi_{\text{obj}} * h_v)(x - t_i v)\right), \qquad (4) \]

for given object domain $\Omega_{\text{obj}}$, motion velocity $v$, and object and background intensity functions $f_{\text{obj}}$ and $f_{\text{bg}}$, respectively. If we did not carefully model the observed intensities as the moving object occludes and uncovers the background, we would observe $(f(t, \cdot) * h_v)$ on the object domain and $f_{\text{bg}}$ elsewhere (cf. the combination of Eq. (14) and Eq. (3) in [8]). Given the more precise motion blur model proposed here, we now proceed to derive a variational formulation to simultaneously estimate all parameters in this equation based on two consecutive frames.
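To make the difference between the consistent model (4) and the naive model concrete, the following sketch evaluates both on a 1D periodic grid. It is an illustration under assumptions that are not from the paper: integer pixel offsets stand in for $t_i v$, an odd box width stands in for $\tau |v|$, and all function names are hypothetical.

```python
def box_blur(signal, width):
    """Periodic convolution with a centered box filter of odd `width`."""
    n, half = len(signal), width // 2
    return [
        sum(signal[(x - j) % n] for j in range(-half, half + 1)) / width
        for x in range(n)
    ]


def shift(signal, offset):
    """Return the signal x -> signal[x - offset] on a periodic grid."""
    n = len(signal)
    return [signal[(x - offset) % n] for x in range(n)]


def consistent_blur(f_obj, chi_obj, f_bg, width, offset):
    """Discrete analogue of (4): blurred object plus background weighted by uncovered time."""
    obj_part = shift(box_blur([f * c for f, c in zip(f_obj, chi_obj)], width), offset)
    coverage = shift(box_blur(chi_obj, width), offset)
    return [o + b * (1.0 - a) for o, b, a in zip(obj_part, f_bg, coverage)]


def naive_blur(f_obj, chi_obj, f_bg, width, offset):
    """Blur the composed sharp frame on the shifted object domain, keep f_bg elsewhere."""
    chi_t = shift(chi_obj, offset)
    f_obj_t = shift(f_obj, offset)
    frame = [fo * c + b * (1.0 - c) for fo, c, b in zip(f_obj_t, chi_t, f_bg)]
    blurred = box_blur(frame, width)
    return [bl if c > 0.5 else b for bl, c, b in zip(blurred, chi_t, f_bg)]


n = 24
chi = [1.0 if 8 <= x < 16 else 0.0 for x in range(n)]   # object occupies pixels 8..15
f_obj = [1.0] * n                                        # bright, untextured object
f_bg = [0.2 if x % 2 == 0 else 0.8 for x in range(n)]    # striped, sharp background
consistent = consistent_blur(f_obj, chi, f_bg, width=5, offset=0)
naive = naive_blur(f_obj, chi, f_bg, width=5, offset=0)
```

At pixel 7, just outside the object, the consistent model blends object intensity into the sharp background (0.4 · 1.0 + 0.6 · 0.8 = 0.88), whereas the naive model returns the untouched background value 0.8. Far away from the object both models reproduce the background exactly.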

3.  A  Mumford-Shah  model 

Given two frames $g_1$ and $g_2$ of a video sequence with motion blur recorded at times $t_1$ and $t_2$, respectively, we construct a variational model to extract from these frames the domain $\Omega_{\text{obj}}$, the image intensity $f_{\text{obj}}$ of a moving object, and the motion velocity $v$. Here, we assume that the background intensity $f_{\text{bg}}$ can be extracted a priori from the video sequence, for example by averaging pixels with stable values over a sequence of frames. The formulation generalizes easily to include this estimation, as described in Sec. 2. We aim at formulating a joint energy for these degrees of freedom. In modeling this energy we take into account the following observations:

• Given $v$ and intensity maps $f_{\text{obj}}, f_{\text{bg}} : \Omega \to \mathbb{R}$ (extended to the whole domain in a suitable way), we phrase the identification problem of the object boundary $\partial\Omega_{\text{obj}}$ in terms of a piecewise constant Mumford-Shah model. This appears to be well suited, in particular because the unknown contour is significantly smeared out by the motion blur. Hence, a comparison of the expected motion blur $G_i$ with the observed time frames $g_i$ in a least-squares sense, $\int_\Omega (G_i[\Omega_{\text{obj}}, v, f_{\text{obj}}, f_{\text{bg}}] - g_i)^2 \, dx$, is considered as the fidelity energy, where the length of the boundary contour $|\partial\Omega_{\text{obj}}|$ acts as the corresponding prior.

• For known $v$ and $\Omega_{\text{obj}}$, we obtain an almost classical deblurring problem for $f_{\text{obj}}$ with the modification of the blurring kernel given in (3), which is already reflected in the above fidelity term. We expect $f_{\text{obj}}$ to be characterized by edges (cf. Figs. 1, 2, and 6). As a suitable prior for


Figure 6. The performance of the joint model is shown for two consecutive images from an artificially blurred plane sequence. The input images $g_1$ and $g_2$ (top), the recovered object intensity $f_{\text{obj}}$, the zero contour of the level set function $\phi$ at two different relaxation steps (in red and blue) of the algorithm (middle row), and finally the recovered frames $f_1$ and $f_2$ (bottom) are displayed.


Figure 7. A comparison of our joint method with a non-joint method and with a method not taking into account the consistent motion blur model is shown. A restored frame with two zoomed-in areas is depicted for a straightforward scale variant motion deblurring, where the contour is extracted a priori based on pure motion competition (left), for the non-consistent motion blur model on the same a priori computed contour (middle), and for the fully joint method with the consistent model (right).


these intensity maps we select the total variation functional $\int_\Omega |\nabla f_{\text{obj}}| \, dx$ [20], which at the same time guarantees a suitable extension onto the whole space (cf. Fig. 2 for an example of the object intensity $f_{\text{obj}}$ which is extended in a total variation consistent way to a neighborhood of the object domain $\Omega_{\text{obj}}$).

• Finally, given $\Omega_{\text{obj}}$ and the two intensities $f_{\text{obj}}$, $f_{\text{bg}}$, the extraction of the motion velocity $v$ is primarily an optical flow problem. The transport of the object intensity $f_{\text{obj}}$ from time $t_1$ to $t_2$ described in $G_1$ and $G_2$ provides us with information on $v$. In the case of limited intensity modulations on the moving object, it is the comparison of the expected transition profile $\chi_{\text{obj}} * h_v$, encoded in $G_i$, with the observed profile in $g_i$ that will guide the identification of the motion velocity.

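Based on these modeling aspects we finally obtain the energy

\[ \mathcal{E}[\Omega_{\text{obj}}, v, f_{\text{obj}}] = \sum_{i=1,2} \int_\Omega \left(G_i[\Omega_{\text{obj}}, v, f_{\text{obj}}, f_{\text{bg}}] - g_i\right)^2 dx + \int_\Omega \mu |\nabla f_{\text{obj}}| \, dx + \nu |\partial\Omega_{\text{obj}}|, \qquad (5) \]

and ask for a minimizing set of the degrees of freedom $\Omega_{\text{obj}}$, $v$, and $f_{\text{obj}}$. Once a minimizer is known, we can retrieve the deblurred images $f(t_1, \cdot)$ and $f(t_2, \cdot)$ by applying (3).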
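The structure of the energy (5) can be illustrated with a discrete 1D evaluation: a squared fidelity term per frame, a total variation prior on the object intensity, and a boundary length term, here taken as the total variation of the characteristic function. This is a minimal sketch under assumptions that are not from the paper: a periodic 1D grid, integer offsets standing in for $t_i v$, an independent odd box width standing in for $\tau |v|$, and hypothetical function names and weights.

```python
def box_blur(signal, width):
    """Periodic convolution with a centered box filter of odd `width`."""
    n, half = len(signal), width // 2
    return [sum(signal[(x - j) % n] for j in range(-half, half + 1)) / width
            for x in range(n)]


def shift(signal, offset):
    """Return the signal x -> signal[x - offset] on a periodic grid."""
    n = len(signal)
    return [signal[(x - offset) % n] for x in range(n)]


def expected_frame(chi, offset, width, f_obj, f_bg):
    """Discrete analogue of the blur model (4) for one frame."""
    obj_part = shift(box_blur([f * c for f, c in zip(f_obj, chi)], width), offset)
    coverage = shift(box_blur(chi, width), offset)
    return [o + b * (1.0 - a) for o, b, a in zip(obj_part, f_bg, coverage)]


def total_variation(signal):
    """Periodic discrete total variation: sum of absolute neighbor differences."""
    n = len(signal)
    return sum(abs(signal[(x + 1) % n] - signal[x]) for x in range(n))


def energy(chi, offsets, width, f_obj, f_bg, frames, mu, nu):
    """Fidelity over all frames + mu * TV(f_obj) + nu * boundary length of the object."""
    fidelity = 0.0
    for offset, g in zip(offsets, frames):
        expected = expected_frame(chi, offset, width, f_obj, f_bg)
        fidelity += sum((e - gi) ** 2 for e, gi in zip(expected, g))
    return fidelity + mu * total_variation(f_obj) + nu * total_variation(chi)


n = 32
chi = [1.0 if 10 <= x < 18 else 0.0 for x in range(n)]
f_obj = [0.5 + 0.5 * ((x // 2) % 2) for x in range(n)]   # textured object intensity
f_bg = [0.1] * n
true_offsets = (0, 3)
frames = [expected_frame(chi, o, 5, f_obj, f_bg) for o in true_offsets]

e_true = energy(chi, true_offsets, 5, f_obj, f_bg, frames, mu=0.01, nu=0.05)
e_wrong = energy(chi, (0, 1), 5, f_obj, f_bg, frames, mu=0.01, nu=0.05)
```

For frames synthesized from the true parameters the fidelity term vanishes and only the two priors remain, while a wrong motion offset for the second frame increases the energy.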

4.  Discussion  of  the  model 

In this section we validate the performance of our variational model and discuss results obtained for different applications. Figures 2 and 6 demonstrate the model for two different test cases. In both we see the proper identification of the moving object and estimation of the motion velocity. We obtain an estimated velocity $v = (9.47, -0.007)$ of the


Figure  8.  A  blow  up  of  the  moving  object  from  Fig.  1  is  rendered 
for  an  original  frame  with  motion  blur  (left)  and  for  the  restored 
intensity  (right)  computed  by  our  model. 


plane in Fig. 6, compared to the true velocity $v = (10, 0)$. The joint approach for all three unknowns (the motion velocity $v$, the object intensity $f_{\text{obj}}$, and the object domain $\Omega_{\text{obj}}$) turns out to be crucial for a proper reconstruction of blurred video frames. This interdependence is demonstrated by the results in Fig. 7, where we compare our joint approach with a two-step method which first tries to identify $\Omega_{\text{obj}}$ and $v$ based on a motion competition algorithm [6], followed by the actual deblurring in a second step. Note that the proposed method can be regarded as a motion competition method if we skip the convolution with the kernel $h_v$ in the variational formulation. Figure 7 also shows the importance of the consistent motion blur model from (4) for a proper reconstruction in the vicinity of motion singularities. Finally, we have applied our model to a true motion sequence recorded with a hand-held video camera. The sequence shows a toy car moving in front of a puzzle (background). We chose a textured object moving in front of a textured background to demonstrate the interplay between the deblurring steered by the fidelity functional $\mathcal{F}^\delta$ (see Eq. 7) and the reconstruction of sharp edges due to the total variation built into the prior $\mathcal{J}^\delta$. Results showing the overall procedure of our approach are also depicted in Fig. 1. In Fig. 8 we render a zoom onto the moving object, which demonstrates the interplay of the deblurring and the edge reconstruction.


5.  The  minimization  algorithm 

To solve the minimization problem for the energy (5) we represent the object domain $\Omega_{\text{obj}}$ by the zero super level set $\{x \in \Omega : \phi(x) > 0\}$ of a level set function $\phi$ and follow the approach proposed by Chan and Vese [5]. The domain splitting into object and background in the different energy terms is encoded via a Heaviside function $H(\phi)$ with $H(s) = 1$ for $s > 0$, and $0$ elsewhere. Furthermore, the perimeter of the object domain can be rewritten as the total variation of $H(\phi)$, i.e. $|\partial\Omega_{\text{obj}}| = \int_\Omega |\nabla H(\phi)| \, dx$ [1]. As in [5] we consider a regularized Heaviside function $H_\delta(x) := \frac{1}{2} + \frac{1}{\pi} \arctan\left(\frac{x}{\delta}\right)$ for a scale parameter $\delta > 0$. Let us emphasize that the desired guidance of the initial zero contour to the actual object boundary relies on the nonlocal support of this regularized Heaviside function. Applying this approximation, we get a regularized integrand $G_i^\delta$, representing the expected motion blur at time $t_i$:

\[ G_i^\delta[\phi, v, f_{\text{obj}}, f_{\text{bg}}](x) = \left((f_{\text{obj}} H_\delta(\phi)) * h_v\right)(x - t_i v) + f_{\text{bg}}(x) \left(1 - (H_\delta(\phi) * h_v)(x - t_i v)\right). \]

Finally, we obtain an approximate global energy consisting of the fidelity term $\mathcal{F}^\delta$ and the prior $\mathcal{J}^\delta$:

\[ \mathcal{E}^\delta[\phi, v, f_{\text{obj}}] = \mathcal{F}^\delta[\phi, v, f_{\text{obj}}] + \mathcal{J}^\delta[\phi, f_{\text{obj}}] := \sum_{i=1,2} \int_\Omega \left(G_i^\delta[\phi, v, f_{\text{obj}}, f_{\text{bg}}] - g_i\right)^2 dx + \int_\Omega \mu |\nabla f_{\text{obj}}| + \nu |\nabla H_\delta(\phi)| \, dx. \]

This expression depends on the motion vector $v \in \mathbb{R}^2$ and two scalar, unknown functions, namely the level set description $\phi$ of the object domain $\Omega_{\text{obj}}$ and the object intensity $f_{\text{obj}}$. Now, we take into account discrete intensities for a given video frame resolution of $n \times m$ pixels. We combine this with a finite difference approximation of the energy, and denote by $\Phi$ and $F_{\text{obj}}$ the corresponding vectors of nodal values in $\mathbb{R}^{nm}$. In what follows, we will outline an energy relaxation method in this already spatially discrete setting based on an operator splitting with step size control and a regularized descent with respect to the level set description.
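The regularized Heaviside function and its derivative are simple to write down, and a 1D example shows how the total variation of $H_\delta(\phi)$ approximates the boundary length. The sketch below is illustrative only: the 1D grid, the chosen level set function, and the function names are assumptions that are not from the paper.

```python
import math


def heaviside_reg(s, delta):
    """Regularized Heaviside function H_delta(s) = 1/2 + arctan(s / delta) / pi."""
    return 0.5 + math.atan(s / delta) / math.pi


def heaviside_reg_prime(s, delta):
    """Derivative H'_delta(s) = delta / (pi * (delta^2 + s^2)); positive for every s."""
    return delta / (math.pi * (delta * delta + s * s))


def boundary_length_1d(phi, delta):
    """Discrete total variation of H_delta(phi) on a non-periodic 1D grid."""
    h = [heaviside_reg(p, delta) for p in phi]
    return sum(abs(h[x + 1] - h[x]) for x in range(len(h) - 1))


# Signed-distance-like level set function: positive on an interval around x = 10.
phi = [4.0 - abs(x - 10) for x in range(21)]
length = boundary_length_1d(phi, delta=0.1)
```

In 1D the boundary of an interval consists of two points, and the computed value is close to 2 for small $\delta$. The derivative stays strictly positive even far from the zero level set, which is the nonlocal support that the text relies on for guiding the contour.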

Initialization. At first, given an initial contour, we select and fix (in a very rough approximation step) $F_{\text{obj}}$ as the intensity values of one of the images $g_1$ and $g_2$. Then, we relax the functional $\int_\Omega \sum_{i=1,2} (\tilde{G}_i^\delta - g_i)^2 + \nu |\nabla H_\delta(\phi)| \, dx$, where $\tilde{G}_i^\delta$ is obtained from $G_i^\delta$ by skipping the motion blur convolution. This initializing step can be regarded as a “motion competition approach” (as in the level set formulation of [6]), and we obtain an initial contour $\Phi^0$ and an initial estimate $v^0$ for the motion velocity. Now, fixing $\Phi^0$ and $v^0$, a standard deblurring based on (2) is performed on $g_1$ and $g_2$ to obtain an initial estimate $F^0_{\text{obj}}$.


Gradient descent. We observed experimentally a significantly different roughness (difference of gradient directions) of the energy landscape associated with the unknowns $\Phi$, $v$, and $F_{\text{obj}}$. Hence, an operator splitting strategy which separates these directions and incorporates different time steps for each of them turns out to be appropriate. In any subsequent descent step we pick up the newly computed quantities from the same iteration. As step size control we consider Armijo's rule [15], evaluated separately for all three components. The descent in the level set description $\Phi$ of the object domain $\Omega_{\text{obj}}$ requires a special treatment.

A point-wise evaluation of shape derivatives (here given as variations with respect to the level set function) in the presence of fine scale fluctuations in the corresponding integrand of the shape functional (in our case object and background texture blurred solely in the direction of motion) is questionable (cf. Fig. 9, which shows non-smoothness and concentration of the gradient). Hence, we incorporate a regularized gradient descent in the level set function inspired by the Sobolev active contour approach [22]. It is based on a Gaussian filtering of the descent direction (presented in the appendix) with a filter $\mathcal{G}_\sigma$ of width $\sigma = 0.005$. Let us emphasize that the resulting regularized descent does not affect the energy landscape itself, but solely the descent path towards the set of minimizers.
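The effect of filtering the descent direction can be sketched in 1D: a gradient concentrated at a single node is spread over its neighborhood, while the energy itself is left unchanged. The code below is a hedged illustration and not the authors' implementation. It assumes a periodic 1D grid, a truncated and normalized Gaussian kernel, and a width `sigma` measured in grid cells, whereas the value $\sigma = 0.005$ in the text presumably refers to the units of the image domain. All names are hypothetical.

```python
import math


def gaussian_kernel(sigma):
    """Truncated, normalized Gaussian with radius ceil(3 * sigma) grid cells."""
    radius = max(1, int(math.ceil(3.0 * sigma)))
    weights = [math.exp(-0.5 * (j / sigma) ** 2) for j in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth_direction(gradient, sigma):
    """Periodic convolution of the point-wise gradient with the Gaussian kernel."""
    kernel = gaussian_kernel(sigma)
    n, half = len(gradient), len(kernel) // 2
    return [
        sum(kernel[j + half] * gradient[(x - j) % n] for j in range(-half, half + 1))
        for x in range(n)
    ]


def regularized_step(phi, gradient, tau, sigma):
    """One descent step in the level set values along the filtered direction."""
    direction = smooth_direction(gradient, sigma)
    return [p - tau * d for p, d in zip(phi, direction)]


# A gradient concentrated at one node, similar in spirit to the spikes in Fig. 9.
spike = [0.0] * 21
spike[10] = 1.0
smoothed = smooth_direction(spike, sigma=2.0)
phi_next = regularized_step([0.0] * 21, spike, tau=0.5, sigma=2.0)
```

The filtered direction has the same total mass as the spike but a much smaller peak, and its inner product with the original gradient remains positive in this example, so the step still decreases the energy to first order.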

Stopping criterion. As a stopping criterion we require the offsets in all three unknowns computed in the last time step and measured in the Euclidean norm to be bounded by a threshold parameter $\epsilon$. In our implementation we have chosen $\epsilon = 0.01$.

A  plot  of  the  energy  decay  for  the  application  in  Fig.  2 
is  given  in  Fig.  10.  Finally,  let  us  summarize  the  algorithm 
in  pseudo  code  notation: 


Figure 9. Color-coded point-wise gradient $\operatorname{grad}_\Phi \mathcal{E}^\delta$ for one iteration from the application in Fig. 2.


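EnergyDescent(g_1, g_2) {
    initialize Phi^0, v^0, F_obj^0; k = 0;
    do {
        tau_Phi = ArmijoStepSize[E^delta, Phi^k];
        Phi^{k+1} = Phi^k - tau_Phi G_sigma * grad_Phi E^delta[Phi^k, v^k, F_obj^k];
        tau_v = ArmijoStepSize[E^delta, v^k];
        v^{k+1} = v^k - tau_v grad_v E^delta[Phi^{k+1}, v^k, F_obj^k];
        tau_F = ArmijoStepSize[E^delta, F_obj^k];
        F_obj^{k+1} = F_obj^k - tau_F grad_{F_obj} E^delta[Phi^{k+1}, v^{k+1}, F_obj^k];
        k = k + 1;
    } while (max(||Phi^k - Phi^{k-1}||, ||v^k - v^{k-1}||, ||F_obj^k - F_obj^{k-1}||) > epsilon)
}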
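The control flow of the pseudo code (one Armijo-controlled step per unknown, each step using the quantities already updated in the same sweep) can be sketched on a toy problem. The code below is not the authors' implementation: the three scalars stand in for $\Phi$, $v$, and $F_{\text{obj}}$, the quadratic `toy_energy` is a made-up substitute for $\mathcal{E}^\delta$, the Gaussian filtering of the first direction is omitted, and the backtracking variant of Armijo's rule with the parameters `tau0`, `beta`, and `c` is an assumption.

```python
def armijo_step(energy, point, grad, tau0=1.0, beta=0.5, c=1e-4, max_halvings=40):
    """Backtracking Armijo rule: largest tau0 * beta^j with sufficient energy decrease."""
    e0 = energy(point)
    slope = sum(g * g for g in grad)
    tau = tau0
    for _ in range(max_halvings):
        trial = [p - tau * g for p, g in zip(point, grad)]
        if energy(trial) <= e0 - c * tau * slope:
            return tau
        tau *= beta
    return 0.0  # no admissible step found, leave the unknown unchanged


def toy_energy(phi, v, f):
    """Hypothetical convex energy with very different curvature in each unknown."""
    return (phi - 1.0) ** 2 + 10.0 * (v + 2.0) ** 2 + 0.1 * (f - 3.0) ** 2 + 0.5 * phi * v


def toy_gradient(phi, v, f):
    """Partial derivatives of toy_energy with respect to phi, v, and f."""
    return (2.0 * (phi - 1.0) + 0.5 * v, 20.0 * (v + 2.0) + 0.5 * phi, 0.2 * (f - 3.0))


def energy_descent(phi, v, f, eps=1e-6, max_iter=1000):
    """Operator splitting: descend in phi, then v, then f, each with its own step size."""
    history = [toy_energy(phi, v, f)]
    for _ in range(max_iter):
        old = (phi, v, f)
        g = toy_gradient(phi, v, f)[0]
        phi -= armijo_step(lambda p: toy_energy(p[0], v, f), [phi], [g]) * g
        g = toy_gradient(phi, v, f)[1]   # uses the freshly updated phi
        v -= armijo_step(lambda p: toy_energy(phi, p[0], f), [v], [g]) * g
        g = toy_gradient(phi, v, f)[2]   # uses the freshly updated phi and v
        f -= armijo_step(lambda p: toy_energy(phi, v, p[0]), [f], [g]) * g
        history.append(toy_energy(phi, v, f))
        if max(abs(phi - old[0]), abs(v - old[1]), abs(f - old[2])) <= eps:
            break
    return phi, v, f, history


phi_min, v_min, f_min, energies = energy_descent(0.0, 0.0, 0.0)
```

Because each unknown receives its own step size, the stiff direction (here `v`) ends up with a much smaller step than the flat direction (here `f`), which mirrors the motivation for separate time steps given in the text.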

For the convenience of the reader, a comprehensive collection of variations of the different energy contributions


Figure  10.  Plot  of  the  energy  decay  in  the  descent  algorithm. 


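comprised in the gradient vectors $\operatorname{grad}_\Phi \mathcal{E}^\delta$, $\operatorname{grad}_v \mathcal{E}^\delta$, and $\operatorname{grad}_{F_{\text{obj}}} \mathcal{E}^\delta$ is given in the Appendix.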

6.  Conclusions  and  outlook 

In this work, we have presented a Mumford-Shah type variational formulation for joint motion estimation and deblurring from video, which includes a segmentation of the moving region. Following the tradition of jointly solving for interdependent unknowns, we have shown that this formulation outperforms individual and independent estimates. In particular, we present a motion blur model that is consistent at motion discontinuities and demonstrate it to be essential for proper deblurring.

Although the presented framework is generic, we have specifically addressed the case of a single moving object and a static background. Multiple objects can be handled simply by having multiple unknown regions in the general formulation introduced here (cf. the approach by Chan and Vese [5] for multiple segments). More elegantly, and thereby also permitting a dynamic background, we could consider formulations of the type $f(t, \cdot)\,\chi(\cdot) = f(t + \tau, \cdot)\,\chi(\cdot - tv)$. This constraint means that the function moves with the object, and eliminates the need for having independent $f_{\text{obj}}$ and $f_{\text{bg}}$ functions. Results using this functional are in progress.

7.  Appendix 

Here, we discuss the variation of the different energy contributions comprised in the gradient vectors $\operatorname{grad}_\Phi \mathcal{E}^\delta$, $\operatorname{grad}_v \mathcal{E}^\delta$, and $\operatorname{grad}_{F_{\text{obj}}} \mathcal{E}^\delta$ required to reproduce the gradient descent algorithm. In the case of $F_{\text{obj}}$ and $\Phi$, these gradients consist of the derivatives of the discrete energy with respect to the nodal values $\Phi_j$ and $(F_{\text{obj}})_j$, for $j = 1, \dots, nm$. To shorten notation, we introduce the residual term

\[ r_i(x) := 2 \left[G_i^\delta[\phi, v, f_{\text{obj}}, f_{\text{bg}}](x) - g_i(x)\right]. \]

First, we consider the derivative with respect to $v_1$ and $v_2$ ($v = (v_1, v_2)$), and obtain

\[ \partial_{v_j} \mathcal{F}^\delta = \sum_{i=1}^{2} \int_\Omega \Big( \big[\partial_j (f_{\text{obj}} H_\delta(\phi)) * (k_v - t_i h_v)\big](x - t_i v) - \big[\partial_j H_\delta(\phi) * (k_v - t_i h_v)\big](x - t_i v)\, f_{\text{bg}}(x) \Big)\, r_i(x) \, dx, \]

where $k_v(y) = -(y \cdot v)\, |v|^{-2}\, h_v(y)$. Here, we rewrite the fidelity term $\mathcal{F}^\delta$ in terms of (2), differentiate this and then convert it back to a spatial integral, instead of differentiating $f * h_v$ directly.

Let us remark that here we do not need to regularize the block filter function $h_v$ to be able to calculate the variation of $\mathcal{F}^\delta$ with respect to $v$. This approach at the same time leads to significantly more stable results.

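To deduce the derivatives with respect to the other unknowns, which represent discrete functions, we begin with the first variation of the fidelity functional $\mathcal{F}^\delta$ and the prior functional $\mathcal{J}^\delta$ in the direction of test functions $\vartheta$ and $\psi$ and discretize afterwards:

\[ \frac{d}{d\epsilon} \mathcal{F}^\delta[f_{\text{obj}} + \epsilon\vartheta] \Big|_{\epsilon=0} = \sum_{i=1}^{2} \int_\Omega r_i(x) \left((\vartheta H_\delta(\phi)) * h_v\right)(x - t_i v) \, dx, \]

\[ \frac{d}{d\epsilon} \mathcal{F}^\delta[\phi + \epsilon\psi] \Big|_{\epsilon=0} = \sum_{i=1}^{2} \int_\Omega r_i(x) \Big[ \left((f_{\text{obj}} H'_\delta(\phi) \psi) * h_v\right)(x - t_i v) - f_{\text{bg}}(x) \left((H'_\delta(\phi) \psi) * h_v\right)(x - t_i v) \Big] dx, \]

\[ \frac{d}{d\epsilon} \mathcal{J}^\delta[f_{\text{obj}} + \epsilon\vartheta] \Big|_{\epsilon=0} = -\mu \int_\Omega \operatorname{div}\left(\frac{\nabla f_{\text{obj}}}{|\nabla f_{\text{obj}}|}\right) \vartheta \, dx, \]

\[ \frac{d}{d\epsilon} \mathcal{J}^\delta[\phi + \epsilon\psi] \Big|_{\epsilon=0} = -\nu \int_\Omega \operatorname{div}\left(\frac{\nabla \phi}{|\nabla \phi|}\right) H'_\delta(\phi)\, \psi \, dx. \]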


Here, we have applied straightforward differentiation and integration by parts. To remove the convolution from the test function $\vartheta$ we use the integral transform

\[ \int_\Omega f(x + a)\, (g * h)(x + b) \, dx = \int_\Omega \left(f^{a,+} * h^{b,-}\right)(y)\, g(y) \, dy, \]

where $q^{b,\pm}(x) := q(\pm x + b)$. Now, choosing test functions concentrated at nodes and evaluated for the spatially discretized energy, we finally obtain


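\[ \frac{\partial}{\partial (F_{\text{obj}})_j} \mathcal{F}^\delta = \sum_{i=1}^{2} \left(R_i * h_v^{(-t_i v),-}\right)(x_j)\, H_\delta(\Phi(x_j)), \]

\[ \frac{\partial}{\partial \Phi_j} \mathcal{F}^\delta = \sum_{i=1}^{2} \Big[ \left(R_i * h_v^{(-t_i v),-}\right)(x_j)\, F_{\text{obj}}(x_j) - \left((R_i F_{\text{bg}}) * h_v^{(-t_i v),-}\right)(x_j) \Big] H'_\delta(\Phi(x_j)), \]

\[ \frac{\partial}{\partial (F_{\text{obj}})_j} \mathcal{J}^\delta = -\mu \operatorname{div}\left(\frac{\nabla F_{\text{obj}}(x_j)}{|\nabla F_{\text{obj}}(x_j)|}\right), \]

\[ \frac{\partial}{\partial \Phi_j} \mathcal{J}^\delta = -\nu H'_\delta(\Phi(x_j)) \operatorname{div}\left(\frac{\nabla \Phi(x_j)}{|\nabla \Phi(x_j)|}\right), \]

where $R_i(x) := 2 \left[G_i^\delta[\Phi, v, F_{\text{obj}}, F_{\text{bg}}](x) - g_i(x)\right]$ denotes the spatially discrete blurring residual. Note that we use standard difference quotients to numerically evaluate the derivatives appearing above.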
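The integral transform used above to move the convolution off the test function can be verified numerically in a discrete setting. The following sketch is an illustration under assumptions that are not from the paper: a periodic 1D grid replaces $\Omega$, sums replace integrals, and the shifts `a` and `b` are integers. The function names are hypothetical.

```python
import math


def conv(u, w):
    """Periodic discrete convolution (u * w)[x] = sum_y u[y] * w[x - y]."""
    n = len(u)
    return [sum(u[y] * w[(x - y) % n] for y in range(n)) for x in range(n)]


def reflect_shift(q, b, sign):
    """Discrete analogue of q^{b,+-}(x) := q(+-x + b) on a periodic grid."""
    n = len(q)
    return [q[(sign * x + b) % n] for x in range(n)]


def lhs(f, g, h, a, b):
    """Left-hand side: sum_x f(x + a) * (g * h)(x + b)."""
    n = len(f)
    gh = conv(g, h)
    return sum(f[(x + a) % n] * gh[(x + b) % n] for x in range(n))


def rhs(f, g, h, a, b):
    """Right-hand side: sum_y (f^{a,+} * h^{b,-})(y) * g(y)."""
    transformed = conv(reflect_shift(f, a, +1), reflect_shift(h, b, -1))
    return sum(t * gy for t, gy in zip(transformed, g))


n = 12
f = [math.sin(x) for x in range(n)]                       # plays the role of the residual
g = [float((x * x) % 7) for x in range(n)]                # plays the role of the test function
h = [0.2 if x in (0, 1, 2, n - 1, n - 2) else 0.0 for x in range(n)]  # centered box kernel
left = lhs(f, g, h, a=3, b=-4)
right = rhs(f, g, h, a=3, b=-4)
```

Both sides agree up to rounding, which confirms that the test function $g$ can be isolated as a plain factor on the right-hand side. This is what allows the nodal derivatives to be read off directly.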